Recent Development of HMM-Based Expressive Speech Synthesis and Its Applications
نویسندگان
چکیده
This paper describes the recent development of HMM-based expressive speech synthesis. Although the expressive speech includes a wide variety of expressions such as emotions, speaking styles, intention, attitude, emphasis, focus, and so on, we mainly refer to the speech synthesis techniques for emotions and speaking styles, which would be the most primary expressions in human speech communication. We describe five core techniques, i.e., style modeling, style adaptation, style interpolation, style control, and style estimation. In addition, we also give a brief overview of other applications to expressive speech synthesis and recognition.
منابع مشابه
Speech enhancement based on hidden Markov model using sparse code shrinkage
This paper presents a new hidden Markov model-based (HMM-based) speech enhancement framework based on the independent component analysis (ICA). We propose analytical procedures for training clean speech and noise models by the Baum re-estimation algorithm and present a Maximum a posterior (MAP) estimator based on Laplace-Gaussian (for clean speech and noise respectively) combination in the HMM ...
متن کاملHmm-based Expressive Speech Synthesis —towards Tts with Arbitrary Speaking Styles and Emotions
This paper describes recent progress in our approach to generating expressive speech. A goal of text-to-speech (TTS) synthesis is to have an ability to generate natural sounding speech with arbitrary speaker’s voice characteristics, speaking styles and emotional expressions. To change voice and speaking style and/or emotion of the synthetic speech arbitrarily with maintaining its naturalness, i...
متن کاملImproving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM
Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...
متن کاملHierarchical stress modeling and generation in mandarin for expressive Text-to-Speech
Expressive speech synthesis has received increased attention in recent times. Stress (or pitch accent) is the perceptual prominence within words or utterances, which contributes to the expressivity of speech. This paper summarizes our contribution to Mandarin expressive speech synthesis. A novel hierarchical stress modeling and generation method for Mandarin is proposed and further integrated i...
متن کاملA novel irregular voice model for HMM-based speech synthesis
State-of-the-art text-to-speech (TTS) synthesis is often based on statistical parametric methods. Particular attention is paid to hidden Markov model (HMM) based text-to-speech synthesis. HMM-TTS is optimized for ideal voices and may not produce high quality synthesized speech with voices having frequent non-ideal phonation. Such a voice quality is irregular phonation (also called as glottaliza...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2011